Internal Rewards Mitigate Agent Boundedness
Authors
Abstract
Reinforcement learning (RL) research typically develops algorithms for helping an RL agent best achieve its goals—however they came to be defined—while ignoring the relationship of those goals to the goals of the agent designer. We extend agent design to include the meta-optimization problem of selecting internal agent goals (rewards) which optimize the designer’s goals. Our claim is that well-designed internal rewards can help improve the performance of RL agents which are computationally bounded in some way (as practical agents are). We present a formal framework for understanding both bounded agents and the meta-optimization problem, and we empirically demonstrate several instances of common agent bounds being mitigated by general internal reward functions.
Similar resources
Reinforcement Learning with Internal Reward for Multi-Agent Cooperation: A Theoretical Approach
This paper focuses on multi-agent cooperation, which is generally difficult to achieve without sufficient information about other agents, and proposes a reinforcement learning method that introduces an internal reward for multi-agent cooperation without sufficient information. To guarantee that such cooperation is achieved, this paper theoretically derives the condition of selecting appropriate act...
A Deep Q-Learning Agent for the L-Game with Variable Batch Training
We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states. We also employ a variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training. Despite the large action space due to the number of possible moves, t...
Exploration for Agents with Different Personalities in Unknown Environments
We present in this paper a personality-based architecture (PDA) that combines elements from the subsumption architecture and reinforcement learning to find alternate solutions for problems facing artificial agents exploring unknown environments. The underlying PDA algorithm is decomposed into layers according to the different (non-contiguous) stages that our agent passes through, which in turn are i...
Perceptual Reward Functions
Reinforcement learning problems are often described through rewards that indicate if an agent has completed some task. This specification can yield desirable behavior; however, many problems are difficult to specify in this manner, as one often needs to know the proper configuration for the agent. When humans are learning to solve tasks, we often learn from visual instructions composed of images...
Adaptive Control for Multiple Cooperative Robot Arms
In this paper, we address the control problem of multiple robots manipulating a load cooperatively. First, we propose a controller that ensures the asymptotic convergence of the load position and the internal forces to their desired values. Next, we propose an adaptive control scheme for the multi-robot system. The adaptive controller ensures the asymptotic convergence of the load po...